SA-Net: Deep Neural Network for Robot Trajectory Recognition from RGB-D Streams
Learning from demonstration (LfD) and imitation learning offer new paradigms
for transferring task behavior to robots. A class of methods that enable such
online learning require the robot to observe the task being performed and
decompose the sensed streaming data into sequences of state-action pairs, which
are then input to the methods. Thus, recognizing the state-action pairs
correctly and quickly in sensed data is a crucial prerequisite for these
methods. We present SA-Net, a deep neural network architecture that recognizes
state-action pairs from RGB-D data streams. SA-Net performed well in two
diverse robotic applications of LfD -- one involving mobile ground robots and
another involving a robotic manipulator -- which demonstrates that the
architecture generalizes well to differing contexts. Comprehensive evaluations,
including deployment on a physical robot, show that SA-Net significantly
improves on the accuracy of the previous method that utilizes traditional image
processing and segmentation.
Comment: (in press)
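As a rough illustration of the "sequences of state-action pairs" that the LfD methods above consume, the following sketch turns an already-recognized sequence of grid positions into such pairs. It is not SA-Net, which recognizes the pairs directly from RGB-D streams; the grid-cell state encoding, the `MOVES` action labels, and the function name `to_state_action_pairs` are all hypothetical and chosen only to show the data format.

```python
from typing import List, Tuple

State = Tuple[int, int]  # hypothetical state: (x, y) grid cell of the observed robot

# Hypothetical action labels keyed by the change in grid position.
MOVES = {
    (0, 0): "stay",
    (1, 0): "east",
    (-1, 0): "west",
    (0, 1): "north",
    (0, -1): "south",
}

def to_state_action_pairs(states: List[State]) -> List[Tuple[State, str]]:
    """Pair each state with the action implied by the transition to the next state."""
    pairs = []
    for cur, nxt in zip(states, states[1:]):
        delta = (nxt[0] - cur[0], nxt[1] - cur[1])
        # Transitions that match no known move are labeled "unknown".
        pairs.append((cur, MOVES.get(delta, "unknown")))
    return pairs

trajectory = to_state_action_pairs([(0, 0), (1, 0), (1, 1)])
```

The final state has no successor, so it yields no pair; a trajectory of n observed states produces n - 1 state-action pairs.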
A Novel Variational Lower Bound for Inverse Reinforcement Learning
Inverse reinforcement learning (IRL) seeks to learn the reward function from
expert trajectories, to understand the task for imitation or collaboration,
thereby removing the need for manual reward engineering. However, IRL in the
context of large, high-dimensional problems with unknown dynamics has been
particularly challenging. In this paper, we present a new Variational Lower
Bound for IRL (VLB-IRL), which is derived under the framework of a
probabilistic graphical model with an optimality node. Our method
simultaneously learns the reward function and policy under the learned reward
function by maximizing the lower bound, which is equivalent to minimizing the
reverse Kullback-Leibler divergence between an approximated distribution of
optimality given the reward function and the true distribution of optimality
given trajectories. This leads to a new IRL method that learns a valid reward
function such that the policy under the learned reward achieves expert-level
performance on several known domains. Importantly, the method outperforms
existing state-of-the-art IRL algorithms on these domains, with the learned
policy obtaining higher reward.
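To make the direction of the "reverse" Kullback-Leibler divergence concrete, here is a minimal sketch for discrete distributions. By the usual convention, the reverse KL places the approximating distribution first, KL(q || p). This is not the VLB-IRL objective itself; the two toy distributions over a binary optimality variable and the names `approx` and `target` are assumptions made purely for illustration.

```python
import math

def kl_divergence(q, p):
    """Return KL(q || p) in nats for two discrete distributions of equal length."""
    if len(q) != len(p):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue  # terms with q_i = 0 contribute nothing
        if pi == 0.0:
            return math.inf  # q puts mass where p has none
        total += qi * math.log(qi / pi)
    return total

# Hypothetical toy distributions over (optimal, not optimal).
approx = [0.7, 0.3]   # stands in for the approximated distribution
target = [0.9, 0.1]   # stands in for the true distribution

reverse_kl = kl_divergence(approx, target)  # approximation first: reverse KL
forward_kl = kl_divergence(target, approx)  # true distribution first: forward KL
```

The two values differ because the KL divergence is not symmetric, which is why the abstract specifies the direction being minimized.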
A Hierarchical Bayesian model for Inverse RL in Partially-Controlled Environments
Robots learning from observations in the real world using inverse
reinforcement learning (IRL) may encounter objects or agents in the
environment, other than the expert, that cause nuisance observations during the
demonstration. These confounding elements are typically removed in
fully-controlled environments such as virtual simulations or lab settings. When
complete removal is impossible, the nuisance observations must be filtered out.
However, identifying the source of each observation is difficult when many
observations are made. To address this, we present a hierarchical
Bayesian model that incorporates both the expert's and the confounding
elements' observations, thereby explicitly modeling the diverse observations a
robot may receive. We extend an existing IRL algorithm originally designed to
work under partial occlusion of the expert to consider the diverse
observations. In a simulated robotic sorting domain containing both occlusion
and confounding elements, we demonstrate the model's effectiveness. In
particular, our technique outperforms several other compared methods and is
second only to having perfect knowledge of the subject's trajectory.
Comment: 8 pages, 10 figures